Enhancing Sumerian Lemmatization by Unsupervised Named-Entity Recognition
Abstract
Lemmatization of Sumerian is much more challenging than lemmatization of modern languages: Sumerian is a long-dead language, highly skilled language experts are extremely scarce, and more and more Sumerian texts are becoming available. This paper describes how our unsupervised Sumerian named-entity recognition (NER) system helps improve the lemmatization of Ur III period texts in the Cuneiform Digital Library Initiative (CDLI), a specialist database of cuneiform texts. Experiments show that leveraging our system with minimal seed annotation yields a promising improvement in personal name annotation in such texts and a substantial reduction in expert annotation effort.
Similar resources
Unsupervised Sumerian Personal Name Recognition
This paper describes an unsupervised named-entity recognition (NER) system to identify personal names in Sumerian cuneiform documents from the Ur III period. We are motivated by the needs of social and economic historians of that period to identify specific persons of importance and such historically relevant facts as can be discerned by the surviving texts. The work was confronted by the chall...
Lemmatization of Multi-word Common Noun Phrases and Named Entities in Polish
In the paper we present PoLem, a tool for the lemmatization of multi-word common noun phrases and named entities in Polish. The tool is based on a set of manually crafted rules and heuristics utilizing a set of dictionaries (including morphological, named entities and inflection patterns). The accuracy of lemmatization obtained by the tool reached 97.99% on a dataset with multi-word common ...
Named Entity Recognition for Highly Inflectional Languages: Effects of Various Lemmatization and Stemming Approaches
In this paper, we study the effects of various lemmatization and stemming approaches on the named entity recognition (NER) task for Czech, a highly inflectional language. Lemmatizers are seen as a necessary component of Czech NER systems and have been used in all published papers about Czech NER so far. Thus, it is of utmost importance to explore their benefits, limits and differences between...
Enhancing Medical Named Entity Recognition with Features Derived from Unsupervised Methods
Creating an annotated corpus for training a named entity recognition model is expensive, particularly in specialised domains, such as medicine, which require expert annotators. Moreover, a model trained on text from one medical sub-domain often shows a drop in performance when applied to texts from another sub-domain, and annotated text from this other sub-domain might be required. When incorp...
Improving Persian Named Entity Recognition Using the Kasra Ezafe
Named entity recognition is a process in which the names of people, places (cities, countries, seas, etc.) and organizations (public and private companies, international institutions, etc.), as well as dates, currencies and percentages in a text, are identified. Named entity recognition plays an important role in many NLP tasks such as semantic role labeling, question answering, summarization, machine ...